home *** CD-ROM | disk | FTP | other *** search
Text File | 1994-02-09 | 54.1 KB | 1,367 lines |
- Departement Informatik
- Institut für Computersysteme
- Eidgenössische Technische Hochschule
- Zürich
-
- Robert Griesemer
- On the Linearization of Graphs and Writing Symbol Files
-
- Cuno Pfister (ed.)
- Oberon Technical Notes
- Beat Heeb
- Josef Templ
- March 1991
-
- Address of the authors:
- Computersysteme
- ETH-Zentrum
- CH-8092 Zurich, Switzerland
-
- e-mail:
- griesemer@inf.ethz.ch
- pfister@inf.ethz.ch
-
- Copyright (c) 1991 Departement Informatik, ETH Zürich
-
- On the Linearization of Graphs and Writing Symbol Files
- Robert Griesemer
-
- Abstract
-
- The linearization of general graphs represented by pointer and record
- data structures is a problem often arising in computer programs.
- Whenever a graph has to be stored on or transmitted over a sequentially
- organized carrier, a form of linearization is used. A simple algorithm
- for this purpose is presented and a special application - the writing of
- symbol files as required by modern language compilers - is described in
- more detail.
-
- Keywords: Linearization, Graphs, Symbol Files.
-
- 1. Linearization of Graphs
-
- General graphs occur in various forms in computer science, sophisticated
- data structures may often be interpreted as graphs. Whenever such a data
- structure has to be stored on a file or has to be transmitted over a
- network, the linearization problem occurs. For special forms of
- non-linear data structures (e.g. trees) well-known solutions exist and
- are comprehensively described in the basic literature. Nevertheless, for
- more general data structures the wheel has to be reinvented and
- literature is difficult to find [ReMö86]. The problem has a twofold
- nature: firstly, the graph has to be linearized by means of a write
- algorithm , and secondly, it has to be rebuilt out of its linear
- description by a read algorithm . In the following, linearization means
- writing a graph to a file.
-
- 1.1 Preconditions
-
- First of all, we remember that every general (undirected) graph may be
- represented by a directed graph, by means of having two directed edges
- instead of an undirected one. Hence, we concentrate on directed graphs
- only. Secondly, we consider only rooted and connected graphs; i.e.
- graphs with a special root node and a path from the root to every other
- node. Unconnected graphs or graphs with several root nodes are easily
- extended such, that they obey the above preconditions. Thirdly, as an
- (academic) restriction only finite graphs are considered.
-
- In computer programs, graphs may be represented in various forms,
- depending on the available notational support (the programming language)
- and the way the data structure is used (the program). Here we consider
- only graphs described in terms of pointers and records; i.e. every graph
- node is represented by a record and every edge is represented by a
- pointer. Note that possible information attached to the edges of a graph
- ( labeled edges ) may be interpreted as additional node data.
-
- The nodes of such a graph may not all have the same type; i.e. the
- amount of data in every node may be different and the number of directed
- edges to other nodes may vary. It depends on the application and
- available language how different nodes are specified. For the sake of
- simplicity we identify each node with a positive tag -number (> 0), so
- that every node type is clearly determined by its tag and vice versa.
- Then, every node contains a certain amount M(tag) of data and (at most)
- a certain number N(tag) of directed edges to other nodes. Hence, a
- general graph node and its edges may be described in the following way
- (written in Oberon [Wi88]):
-
- TYPE
- Node = POINTER TO NodeDesc;
-
- NodeDesc = RECORD
- tag: INTEGER; (* determines node type; tag > 0 *)
- data: ARRAY M(tag) OF INTEGER; (* M depends on the node type (tag) *)
- link: ARRAY N(tag) OF Node (* N depends on the node type (tag) *)
- END;
-
- Note that any data structure represented in any way in computer memory
- may be written to a file by simply writing out sequentially the
- corresponding memory area. Reading requires "only" the data to be read
- into the same memory area (otherwise pointers would be incorrect). But
- normally this is an inappropriate solution for several reasons: often
- the data structure is distributed over the entire available memory and
- writing would lead to an immense waste of space. For reading, the
- destination memory addresses must be specified which is almost never
- possible and a file created this way is inherently not portable (to
- another computer system).
-
- Hence, writing of graphs requires pointers to be transformed into
- another form of reference and vice versa for reading. In addition,
- reading and writing should be as efficient as possible considering both
- memory and time; e.g. each node description should appear once and only
- once on the file.
-
- 1.2 The Write Algorithm
-
- The write algorithm resembles closely a naive recursive mark-algorithm
- for garbage collectors, actually it is nearly the same task to be done:
- all nodes have to be traversed and marked; marking is necessary to avoid
- a second traversal and to refer to the node's first occurence. While
- traversing the graph its structure is written to a file. Hence, the node
- data structure has to be extended by a mark field, which initially must
- be zero:
-
- TYPE
- Node = POINTER TO NodeDesc;
-
- NodeDesc = RECORD
- mark: INTEGER; (* used for linearization; initially = 0 *)
- tag: INTEGER; (* determines node type; tag > 0 *)
- data: ARRAY M(tag) OF INTEGER; (* N depends on the node type (tag) *)
- link: ARRAY N(tag) OF Node (* M depends on the node type (tag) *)
- END;
-
- We say a node q is marked iff q.mark > 0. The WriteNode procedure (see
- below) traverses the graph in pre-order, starting with the root node.
- Whenever an unmarked node is encountered, the node is numbered and thus
- marked (line 4), the node tag and data are written (lines 4 and 5) and
- its successor nodes are traversed (line 6). Note that a node must be
- numbered before its subtrees are traversed because they may refer to it.
- Numbering is done using the counter NofNodes which is one initially and
- is then incremented by one with every processed node. Hence, NofNodes is
- always greater than zero. Whenever an already marked node is traversed,
- only its negative node number is written out as reference number (line
- 2). If the node doesn't exist (i.e. q = NIL) zero is written out (line
- 1):
-
- VAR
- NofNodes: INTEGER; (* node number; initially = 1 *)
-
- PROCEDURE WriteNode(q: Node);
- VAR i: INTEGER;
- BEGIN
- (1) IF q = NIL THEN WriteInt(0)
- (2) ELSIF q.mark > 0 THEN (* already marked *) WriteInt(-q.mark)
- (3) ELSE (* first occurence *)
- (4) WriteInt(q.tag); q.mark := NofNodes; INC(NofNodes);
- (5) i := 0; WHILE i < M(tag) DO WriteInt(q.data[i]); INC(i) END;
- (6) i := 0; WHILE i < N(tag) DO WriteNode(q.link[i]); INC(i) END
- (7) END
- END WriteNode;
-
- During execution of WriteNode with argument root all nodes are numbered
- in the order of their first occurence. References to already written
- nodes are the negative node numbers. While reading the file this
- information will be used to reconstruct the graph (see 1.3 The Read
- Algorithm). Before WriteNode can be called a second time for the same
- graph or parts of it, the preconditions have to be established again,
- i.e. the nodes must be unmarked. When using time-stamps, no such unmark
- pass is necessary (due to an idea of J. Templ, see [PfiHeTe91]: A
- Symmetric Solution to the Load/Store Problem ). The WriteNode procedure
- leads to the following file structure (in EBNF, terminal symbols are
- described in double quotes):
-
- Graph = NodeRef | NoNode | NodeDesc.
-
- NodeRef = "negative node number (< 0)".
-
- NoNode = "0".
-
- NodeDesc = "node tag (> 0)" "node data" {Graph}.
-
- It is obvious that the algorithm terminates for all graphs: there is
- only a finite number of nodes and every procedure call works on at least
- one node. The recursion is stopped when an already marked node and hence
- a loop is found (line 2) and the algorithm returns to its predecessor
- node. In line 4 the assignment q.mark := NofNodes is executed only if
- q.mark = 0, i.e. each node is numbered at most once (because NofNodes >
- 0). On the other hand, since after numbering of the node q^ (line 4)
- WriteNode numbers all subgraphs of q^ (line 6), each node is numbered at
- least once. In combination we may conclude that each node is marked
- exactly once. Finally, because each node's data is written iff the node
- was numbered immediatly before (line 5), it becomes clear that each node
- is written only once.
-
- Note that WriteNode is a one-pass algorithm and that no additional data
- structures except NofArgs and the procedure activation stack are
- required. The run time complexity is O(n) where n is the number of edges
- in the graph. As we mentioned above, WriteNode resembles closely the
- mark phase of a mark-and-sweap garbage collector, therefore it is
- possible to transform the algorithm into a completely iterative form
- (e.g. if stack space is critical). The analoguous transformations for a
- simple garbage collector are described in [PfiHeTe91].
-
- 1.3 The Read Algorithm
-
- The read algorithm reconstructs a (sub-) tree from its description on a
- file. Each node block in the file starts with an identification tag.
- According to the EBNF file structure there are three cases to
- distinguish: the node tag may be zero, negative or positive (excluding
- zero).
-
- VAR
- NofNodes: INTEGER; (* node number; initially = 1 *)
- NodeTab: ARRAY MaxNodes OF Node; (* node table; NodeTab[0] = NIL *)
-
- PROCEDURE ReadNode(VAR q: Node);
- VAR tag, i: INTEGER;
- BEGIN
- (1) ReadInt(tag);
- (2) IF tag <= 0 THEN (* node reference *) q := NodeTab[-tag]
- (3) ELSE (* first occurence *)
- (4) NEW(q, tag); NodeTab[NofNodes] := q; INC(NofNodes);
- (5) i := 0; WHILE i < M(tag) DO ReadInt(q.data[i]); INC(i) END;
- (6) i := 0; WHILE i < N(tag) DO ReadNode(q.link[i]); INC(i) END
- (7) END
- END ReadNode;
-
- The first and second case are handled together by initializing the
- (otherwise unused) NodeTab entry zero to NIL. Because an empty subtree
- is identified by a zero tag, q then automatically becomes NIL (line 2).
- A node that occurs for the first time is identified by a positive number
- which is also its node tag. Then, such a node is created using NEW and
- stored in a global node table NodeTab (line 4). Note again that this
- assignment must be done before any subtrees are read (they may
- potentially refer to it). After creation of the new node its data and
- its subtrees are read (line 5 and 6). A negative node tag refers to an
- already read node, with the node number -tag. This node is obtained from
- the NodeTab array (line 2).
-
- It should be clear that this algorithm rebuilds the original graph, as
- each line in ReadNode has its counterpart in WriteNode and each call of
- WriteNode during writing implies a corresponding call of ReadNode during
- reading. Clearly this algorithm is also O(n) where n is the number of
- edges in the graph. A temporary node array of size n is used for
- reading, this might be a drawback in case of very large graphs. A 2-pass
- algorithm which needs temporary storage only for nodes which are
- referenced more than once is described in the appendix.
-
- 2. Writing Symbol Files
-
- Today's modular programming languages like Modula-2, Ada, Oberon and
- others allow an application to be programmed in several more or less
- independent parts, so-called modules (or packages). Such a module
- typically defines data structures and provides operations (procedures)
- on them, these exported objects may be used ( imported ) by other
- modules without knowledge of their implementation.
-
- Therefore, the information about a module's interface has to be
- available to the compiler in an efficient way: in Modula-2 and Oberon
- the necessary data is recorded on so-called symbol files. For
- historical reasons and to avoid confusion we use the term symbol file
- for any linear data structure which describes a module's interface.
- Actually, it is not necessary that symbol files are represented by a
- separate file (see 2.4 Implementation Aspects).
-
- During compilation, the information about a module's interface is held
- in the symbol table of the compiler. The symbol table can be regarded as
- a graph, hence writing a symbol file requires linearizing the necessary
- partial graph of this table. The methods described in the following have
- been implemented in a compiler for an experimental Oberon-like language
- [Gri90]. The code fragments detail only the principal structure, a
- complete implementation may be found in the appendix.
-
- 2.1 Structure of Symbol Tables
-
- The symbol table of a compiler for an Oberon-like language describes the
- compiled objects, which are constants, types, variables, procedures, or
- modules. The table itself is built when processing declarations. These
- may be nested, so the compiler has to manage a stack of scopes, i.e.
- visibility ranges of objects. For our purposes, it is sufficient to
- store the objects of a scope in a linear list. More sophisticated
- implementations would use structures like binary trees, because objects
- have to be searched efficiently during compilation. In Modula-2 and
- Oberon only global objects may be exported, i.e. only objects within the
- global scope of the symbol table have to be considered for export (Fig.
- 1). Hence, writing the symbol file means linearizing the partial graph
- described by the exported (and therefore in some way marked) objects of
- this scope.
-
- Fig. 1
-
- Global Scope
-
- Each node represents an object of the compiled program, containing all
- the necessary information about it, such as its name, the object kind,
- possibly an address and its type. Object types itself may have a very
- complex structure and are further described using a Struct data
- structure, which again may refer to other Struct nodes [Wi85, Cre90].
- Hence objects and types are described with two different record
- structures:
-
- CONST
- (* object modes *)
- Undef* = 0; Scope = 1; Const* = 2; Type* = 3; Var* = 4; VarPar* = 5;
- XVar* = 6; IVar* = 7; Field* = 8; LProc* = 9; XProc* = 10; IProc* = 11;
- SProc* = 12; Mod* = 13;
-
- (* type forms *)
- Char* = 2; Bool* = 3; SInt* = 4; Int* = 5; LInt* = 6; Set* = 7;
- Real* = 8; LReal* = 9; Cmplx* = 10; LCmplx* = 11; NilTyp* = 13;
- NoTyp* = 14; Proc* = 15; String* = 16; Array* = 17; DynArr* = 18;
- Record* = 19; Pointer* = 20;
-
- TYPE
- Name* = ARRAY 32 OF CHAR;
- Object* = POINTER TO ObjectDesc;
- Struct* = POINTER TO StructDesc;
-
- ObjectDesc* = RECORD
- link*, next*: Object; (* objects are chained using next *)
- name*: Name;
- typ*: Struct;
- marked*: BOOLEAN; (* marked objects are exported *)
- mode*: INTEGER; (* identifies object kind *)
- mnolev*: INTEGER; (* module numbers are <= 0 *)
- ...
- (* object specific data *)
- END;
-
- StructDesc* = RECORD
- ref: INTEGER; (* used as mark field *)
- form*: INTEGER; (* identifies structure kind *)
- obj*: Object; (* points to type object if it exists *)
- len*, size*: LONGINT;
- base*: Struct; (* result-, element-, base- or pointee type *)
- link*: Object (* record scope *)
- END;
-
- Struct nodes describe the type of an object (array, record, pointer,
- procedure); a few types are predefined (e.g. Char, Integer, Real).
- Because types may be recursively defined, the resulting data structure
- may contain cycles. Many objects may have the same type and therefore
- refer to the same struct node.
-
- 2.2 Export
-
- The graph linearization algorithm in the form described in section 1
- actually works for different node kinds (determined by the tag field)
- but requires that all nodes are described using the same record.
- However, enforcing this precondition would have a far too strong impact
- on the structure of a compiler. As we have actually two different
- record types for object and type description, the adequate solution is
- to have two sets of read and write procedures, one set for objects and
- one for types. This distinction is even more justified by the fact that
- objects are stored in a simple linear list and are referenced only once
- (with the exception of type and module objects) while type descriptors
- may be referenced several times and potentially belong to cycles.
- References from struct nodes to their type objects are handled directly
- in the WriteStruct procedure. Modules are never marked for export and
- written using a special procedure. As a consequence, the write
- procedure which traverses the object scope does not have to take care of
- objects already traversed and therefore no marking is necessary. Hence,
- the write procedure degenerates to a simple list traversal while for the
- corresponding read procedure no temporary array is necessary. Using the
- above definitions, the WriteObjects procedure can be written as follows
- (primitive operations like WriteInt and WriteName may be found in the
- appendix section):
-
- PROCEDURE WriteObjects(obj: Object);
- BEGIN
- WHILE obj # NIL DO
- IF obj.marked THEN
- WriteInt(obj.mode);
- IF (obj.mode # Type) OR (obj.typ.obj # obj) THEN
- (* no-type or alias type *)
- WriteName(obj.name)
- ELSE (* other type *) Write(0X)
- END;
- WriteStruct(obj.typ);
- IF obj.mode = Const THEN (* write const value *)
- ELSIF obj.mode = LProc THEN (* write parameter list *)
- ...
- END
- END;
- obj := obj.next
- END;
- WriteInt(Undef) (* termination tag *)
- END WriteObjects;
-
- For exported objects, the mode which describes the object kind, the
- object's name and its type are written. As an optimization, the name of
- types is only written if it concerns alias types. Otherwise, the name
- will be written by the WriteStruct procedure (given below). Then,
- depending on the object mode, additional information is written (e.g.
- the value of constants or the parameter list of procedures).
-
- The WriteStruct procedure closely resembles the Write algorithm
- described in 1.2 but is slightly complicated by the fact that named
- types (i.e. types for which a type object exists) have to be handled
- correctly. Remember, that it is not correct to simply call the
- WriteObjects procedure, because WriteObjects is not built to handle more
- than one reference to an object. In addition, named types (and only
- these!) may be exported and imported over many modules and it must
- always be guaranteed that a type, whether it was imported via several
- modules ( indirect import ) or not, is described by one and only one
- struct node. A unique identification is necessary, which is the type
- name combined with the description of the module where the type was
- defined first. An analogous situation occurs when a type gets an alias
- name: then, the struct node always points to the first type object which
- defined the type.
-
- VAR
- nofStructs: INTEGER; (* structure number; initially = 1 *)
-
- PROCEDURE WriteStruct(typ: Struct);
- VAR name: Name;
- BEGIN
- IF typ = NIL THEN WriteInt(0)
- ELSIF typ.ref > 0 THEN (* already marked *) WriteInt(-typ.ref)
- ELSE (* first occurence *)
- WriteInt(typ.form); typ.ref := nofStructs; INC(nofStructs);
- IF typ.obj # NIL THEN (* named type *)
- name := typ.obj.name;
- IF ~typ.obj.marked THEN (* invisible type *)
- name[0] := CHR(ORD(name[0]) - ORD("@"))
- END;
- WriteName(name); WriteMod(GMod[-typ.obj.mnolev])
- ELSE Write(0X)
- END;
- CASE typ.form OF
- | Proc: (* write parameter list and result type *)
- | Array: (* write element type and length *)
- ...
- END
- END
- END WriteStruct;
-
- Note that for exported named types there is a distinction between those
- with and those without corresponding exported type objects, where the
- latter are called invisible types. Invisible types must not be visible
- in an importing module, i.e. a programmer is not allowed to use them by
- name within a declaration. On the other hand they have to be visible to
- the compiler because it must be ensured that all invisible types with
- the same name are mapped onto a common struct node. Hence, the same
- information as for named types is written, but a simple trick inhibits
- any use of the type name within a program: the first letter of its name
- (which is always greater or equal than "A") is modified such that the
- name becomes a syntactically invalid identifier.
-
- For named types the declaring module has to be known. Because several
- types may be exported by the same module, the corresponding module
- object may be referenced more than once. A special procedure which
- handles modules correctly is used:
-
- VAR
- nofLMods: INTEGER; (* local module number; initially = 0 *)
-
- PROCEDURE WriteMod(mod: Object);
- BEGIN
- IF mod.ref < 0 THEN (* first occurence *)
- mod.ref := nofLMods; INC(nofLMods);
- WriteInt(Mod); WriteKey(mod.key); WriteName(mod.name)
- ELSE WriteInt(-mod.ref)
- END
- END WriteMod;
-
- Like struct nodes, module descriptors are only written at their very
- first occurence. Whenever the same module is referenced later, only its
- reference number is written. Module pointers are never NIL, so numbering
- starts with zero and a module is marked iff mod.ref >= 0. During import
- the compiler has to check that only one version of a module is used
- globally, therefore every module needs a unique key. Remember that
- modules are never written out accidentally by the WriteObjects
- procedure, because they are never marked for export.
-
- Writing the complete symbol file requires writing out the module
- descriptor of the compiled module (by means of WriteMod) followed by
- writing out all its exported objects (by means of WriteObjects).
-
- 2.3 Import
-
- Like the general ReadNode procedure is mirroring the structure of the
- WriteNode procedure, the import procedures are similar to the export
- procedures. This fact allows for easy extension, because additionally
- written data in an export procedure automatically leads to the
- corresponding read operations in the import procedure. Nevertheless, a
- few difficulties need to be mastered anyway.
-
- In several situations an object may be imported more than once: the
- trivial case occurs when the same module is imported twice because of a
- substitution (or alias) name. One might expect that this special case
- should be handled by simply ignoring a second import; but actually a
- multiple import of objects has to be handled anyway, hence double import
- of entire modules is only a special case of a more general situation.
- Multiple import of an object normally occurs for type objects, when
- beeing imported indirectly across different modules. Consider the
- following situation: a module A exports a type (object) T which is used
- in a variable V of a module B. Module C which imports A and B
- consequentially also imports the type T from A and the variable V from B
- and hence once again the type T as type of V (Fig. 2).
-
- Fig. 2
-
- Double import
-
- However, multiply loading objects or structures, if not detected, could
- lead to incorrect incompatibilities during type checking. Therefore,
- each object which is read from the symbol file is discarded if it was
- already present in the symbol table. Hence, each object is represented
- by its very first loaded instance which is called primary instance
- [Gu85]. In consequence, the ReadObjects procedure reads an object from
- the symbol file but the procedure InsertImport inserts it in the
- corresponding module scope only if the object (with the same name) has
- not already been imported. As an optimization, note that only alias
- types have to be (additionally) inserted with InsertImport because their
- names differ from their type object names. All other types are handled
- in the ReadStruct procedure.
-
- PROCEDURE InsertImport(VAR obj: Object; scope: Object);
- VAR p, q: Object;
- BEGIN
- p := scope; q := scope.next;
- WHILE q # NIL DO
- IF q.name = obj.name THEN obj := q; RETURN END;
- p := q; q := q.next
- END;
- obj.mnolev := scope.mnolev; p.next := obj
- END InsertImport;
-
- PROCEDURE ReadObjects(scope: Object);
- VAR mode: LONGINT; obj: Object; typ: Struct; name: Name;
- BEGIN
- LOOP (* read all objects *)
- ReadInt(mode);
- IF mode = Undef THEN (* no more objects *) EXIT
- ELSIF mode = Type THEN
- ReadName(name);
- IF name # "" THEN (* alias type *)
- NewObject(obj, name, Type); ReadStruct(obj.typ);
- InsertImport(obj, scope)
- ELSE (* other types *) ReadStruct(typ)
- END
- ELSE
- ReadName(name); NewObject(obj, name, SHORT(mode));
- ReadStruct(obj.typ);
- IF mode = Const THEN (* read const value *)
- ELSIF obj.mode = LProc THEN (* read parameter list *)
- ...
- END;
- InsertImport(obj, scope)
- END
- END
- END ReadObjects;
-
- The procedure ReadStruct decides upon reading the first number form
- whether the structure was already read from the (same) symbol file or
- not. In the first case, the structure already built is taken out of a
- local structure table LStruct which serves as a translation table for
- structure references analogous to NodeTab in the ReadNode procedure. In
- the second case, the structure is created and inserted in the LStruct
- table, then the specific structure information is read. As described in
- ReadObjects, named types must also be ignored if occuring a second time:
- the structure information is read but then discarded (by means of
- InsertImport) and the primary instance is used (and also inserted in the
- LStruct table).
-
- As a tricky point, notice further: Because each type (named or not) may
- refer to itself, it is absolutely necessary that the primary instance is
- found before any other types are read (which potentially refer to the
- same type and therefore would obtain the wrong structure out of the
- LStruct table). Hence, no other types must occur between the type tag of
- a named type and its identification, i.e. its name and its original
- module description. This rule corresponds to postulate 5 in [Gu85].
-
- VAR
- nofStructs: INTEGER;
- (* structure number; initially = 1 *)
- LStruct: ARRAY MaxNofStructs OF Struct;
- (* local structure table; LStruct[0] = NIL *)
-
- PROCEDURE ReadStruct(VAR typ: Struct);
- VAR form: LONGINT; htyp: Struct; name: Name; obj, mod: Object;
- BEGIN
- ReadInt(form);
- IF form <= 0 THEN (* struct reference or NIL *) typ := LStruct[-form]
- ELSE (* first occurence *)
- NewStruct(htyp, SHORT(form)); ReadName(name);
- IF name # "" THEN (* named type *)
- NewObject(obj, name, Type); obj.marked := TRUE; obj.typ := htyp;
- htyp.obj := obj; ReadMod(mod); InsertImport(obj, mod.link);
- typ := obj.typ
- ELSE typ := htyp
- END;
- LStruct[nofStructs] := typ; INC(nofStructs);
- CASE form OF
- | Proc: (* read parameter list and result type *)
- | Array: (* read element type and length *)
- ...
- END
- END
- END ReadStruct;
-
- At the end, the ReadMod procedure is presented. As expected, a local
- module table LMod is used as translation table for module references. In
- addition, because modules must always be represented by their primary
- instances, a global module table GMod is necessary, which is accessed in
- the InsertMod procedure.
-
- VAR
- nofGMods*: INTEGER; (* global module number; initially = 0 *)
- GMod*: ARRAY MaxNofGMods OF Object; (* global module table *)
-
- PROCEDURE InsertMod*
- (VAR mod: Object; VAR name: ARRAY OF CHAR; key: LONGINT);
- VAR i: INTEGER;
- BEGIN
- i := 0;
- WHILE (i < nofGMods) & (name # GMod[i].name) DO INC(i) END;
- IF i < nofGMods THEN (* module already imported *)
- mod := GMod[i];
- IF mod.key # key THEN err(150) (* key inconsistency *) END
- ELSE
- NewObject(mod, name, Mod); (* must not be visible in global scope *)
- mod.key := key; mod.mnolev := -nofGMods;
- OpenScope(mod.link, mod.mnolev); CloseScope; (* allocate own scope *)
- GMod[nofGMods] := mod; INC(nofGMods)
- END
- END InsertMod;
-
- VAR
- nofLMods: INTEGER; (* local module number; initially = 0 *)
- LMod: ARRAY MaxNofLMods OF Object; (* local module table *)
-
- PROCEDURE ReadMod(VAR mod: Object);
- VAR ref, key: LONGINT; name: Name;
- BEGIN
- ReadInt(ref);
- IF ref > 0 THEN (* first occurence *)
- ReadKey(key); ReadName(name);
- InsertMod(mod, name, key);
- LMod[nofLMods] := mod; INC(nofLMods)
- ELSE mod := LMod[-ref]
- END
- END ReadMod;
-
- Importing a whole module requires reading the module (by means of
- ReadMod) followed by reading all exported objects of this module (by
- means of ReadObjects). Together there are only a few additional rules to
- the general graph algorithm to be observed:
-
- 1. For types and modules , which both may be referenced several times
- within the same symbol file , a marking scheme and a translation table
- analogous to the general graph read/write algorithms is used.
-
- 2. For named types and modules , which both may occur in several symbol
- files , the primary instance always must be used and inserted into the
- local translation tables. If the primary instance already exists, the
- remaining data must be read but then discarded. Repeated occurence of
- the same module is detected using a global module table, repeated
- occurence of the same named type is detected using its module and the
- corresponding module scope.
-
- 3. As a consequence of the second rule, no other types must occur in the
- symbol file before name and module of a named type are specified.
-
- 2.4 Implementation Aspects
-
- Symbol File Representation
-
- As mentioned in the beginning, symbol files need not to be represented
- by a separate file. Actually, a symbol file independent of the
- corresponding object file mirrors the fact that in Modula-2 and other
- languages the definition and the implementation of a module were
- compiled separately: the definition module was compiled into a symbol
- file and the implementation module into an object file. If exported
- objects are marked in some way within the implementation module (as in
- Oberon), a definition module and hence a separate symbol file is not
- necessary. It is more adequate to produce only a single file during
- compilation, which contains all the necessary information about the
- module including its interface description. For example, the
- (conventional) symbol file may be appended to the object file. The
- advantages are obvious: only a single file has to be distributed, this
- file is always self-contained and therefore consistent and it is
- possible to directly generate the interface specification out of an
- object file with a Browser tool. Should it be necessary to inhibit
- public use of a module (e.g. because it supports low-level features), it
- is easy to remove the symbol file from the object file and distribute
- the interface-less object file only.
-
- Canonical Form for Symbol Files
-
- When a module has to be recompiled for some reason without a change in
- its interface, the old module key should be used again in the symbol
- file. Otherwise all depending modules would also be invalidated and
- would have to be recompiled. A simple method to check whether a module's
- interface has changed or not is to bytewise compare the new symbol file
- against the old one. If it has not changed, the old key can be reused.
- This comparison method requires very stable symbol files: e.g. the
- ordering of the exported objects should have no effect on the symbol
- file. This is clearly not fulfilled in the implementation described
- above (which we've chosen for simplicity), because objects are ordered
- in the linear list according to their occurence in the module. Desired
- is a kind of canonical form for symbol files.
-
- As we mentioned earlier, access to a special object in the symbol table
- should be as efficient as possible, so one could implement the table
- using binary trees instead of (unsorted) linear lists. Then, the objects
- could be written in alphabetical order to the symbol file using an
- in-order traversal of the tree. Symbol files in this canonical form are
- invariant to any changes in the ordering of a module's objects.
- Unfortunately this implies an undesired side effect which destroys the
- efficiency of binary trees: Importing a symbol file means inserting the
- imported objects in binary trees. Because the objects are alphabetically
- sorted, the symbol table tree degenerates to a linear list. Further,
- most of the objects in typical modules are known by import. In
- consequence, searching in the symbol table means actually searching in
- linear lists. Hence, the additional implementation effort for simple
- binary trees may be no longer justified.
-
- The situation may be improved by modifing the proposed canonical form.
- Instead of writing all objects in alphabetical order, groups of objects
- of the same kind (with the same mode) are written alphabetically: first
- all constants, then the variables, then types, and so on. Within each
- group the objects are ordered and the group ordering is also specified.
- This is another canonical form which is also invariant against any
- permutations of exported objects. When imported, the trees will not
- completely degenerate but are decomposed into several partial lists. The
- result is at least a better balanced tree. Note that the addional effort
- of traversing the tree in several passes (for each object group) is only
- necessary during export; the more time critical import procedure is not
- at all affected.
-
- Predefined Types
-
- In several languages there exist predefined types which are known to the
- compiler. Such types are always exported using a fixed reference number.
- Before any object is imported, they are inserted into the local
- structure table by the import procedure (see Appendix B).
-
- Increased Import Speed
-
- As measurements show (see below), when reading is bytewise, the import
- speed highly depends on the symbol file length. Besides strings, most of
- the data on a symbol file are integers. Hence, a principal goal should
- be to reduce the space used to write a single integer number. Integers
- occur frequently as tag or reference numbers and most of them are very
- small (with an absolute value less than 64). Nevertheless also bigger
- integers should be managed easily. The ReadInt and WriteInt procedures
- described in [Te90] allow integers to be written in a
- machine-independent format which uses only one byte per number in most
- cases.
-
- 2.5 Measurements
-
- The following measurements show lengths and reading times of symbol
- files. The basic modules of the Oberon System are choosen as a typical
- "module mix". The method described here (used in module CCT) is compared
- to the method used in the existing portable Oberon compilers (module
- OPT, see [Cre90]) which is essentially the same method as described in
- [Gu85]. To be fair, the CCT method was adjusted such that about the same
- information per object was written out as with OPT (e.g. the address of
- each parameter of a procedure) but using a compactifying WriteInt
- procedure. The measurements were made on a Ceres-2 computer [He88] with
- a NS32532 CPU running at 25 Mhz clock speed.
-
- In the left table the lengths of the symbol files in bytes are shown
- (OPT lengths are 100%). On average, the CCT symbol files are about 25%
- shorter than the corresponding OPT files. When the same symbol files are
- written using a non-compactifying WriteInt procedure in CCT (for
- addresses only), the files are about 8 % shorter (measurements not shown
- here). The right table shows the reading times in ms for each symbol
- file (OPT times are 100%). For this, each file was imported 100 times
- and only the pure file reading time without directory operations was
- measured and the average taken. The actual time spent in the import
- procedures is much higher because of directory accesses and may vary
- even for the same file.
-
- Tab. 2
-
- Further analysis of the measurements shows what was presumed, namely a
- practically linear dependency between the file lengths and their reading
- times (Tab. 2); the linear regression coefficient r is nearly 1.0 for
- both methods. Although the average reading time per byte in CCT is
- larger than in OPT, the shorter file lengths made up for this
- difference. For comparison, the values for pure file reading are also
- shown (the reading time per byte is the average measured time for this
- file mix and is only valid for short files in general). In every case
- the file length completely dominates other factors, so to improve
- importing speed shorter symbol files have to be achieved. But
- nevertheless, such an effort is only justified if the file system offers
- some kind of caching strategy for files already open. Otherwise the
- reading time for such short files is negligible compared to the
- directory access time. The import and export procedures in CCT are about
- 20 % shorter in source code size than the OPT procedures (Tab. 2).
-
- 3. Summary
-
- As we have seen, the linearization algorithm for general graphs is very
- simple and easily adjusted to similar problems. In the example of symbol
- files it leads to a clean and understandable solution. The simplicity of
- the algorithm and the fact that only local invariants have to be
- considered, allows symbol files to be easily extended. The main
- difference between the algorithm described here and the one described in
- [Gu85] is that here a pre-order traversal instead of a post-order
- traversal is used. Using a post-order traversal, several postulates have
- to be guaranteed all the time, which complicate the algorithm.
- Nevertheless, cyclic references within types are a problem which has to
- be handled in a special way. Although the post-order algorithm requires
- no recursion for reading, the presented algorithm is faster in the
- average because symbol files are shorter and time used for reading in
- todays computers is determined mostly by the length of the files which
- are to be processed.
-
- Acknowledgements
-
- I would like to thank Josef Templ, Cuno Pfister, Clemens Szyperski and
- H. Mössenböck. Josef answered tricky questions about symbol files; he
- and Cuno worked as proof readers. H. Mössenböck added valuable comments
- to the modified algorithm in Appendix A. Not to forget, the entire paper
- was written using the excellent Write text editor of Clemens.
-
- References
-
- [Cre90] R. Crelier. OP2: A Portable Oberon Compiler . Computersysteme
- ETH Zürich, Technical Report No. 125, February 1990.
-
- [Gri90] R. Griesemer. Seneca - A Language for Numerical Computations on
- Vectorcomputers . Conpar 90 Proceedings, Volume on special technical
- contributions, Zürich, September 1990.
-
- [Gu85] J. Gutknecht. Compilation of Data Structures: A New Approach to
- Efficient Modula-2 Symbol Files . Computersysteme ETH Zürich, Technical
- Report No. 64, July 1985.
-
- [PfiHeTe91] C. Pfister, B. Heeb, J. Templ. Oberon Technical Notes .
- Companion paper.
-
- [ReMö86] P. Rechenberg, H. Mössenböck. An Algorithm for the Linear
- Storage of Dynamic Data Structures . Internal Paper, University of Linz,
- Austria, 1986.
-
- [Te90] J. Templ. SPARC-Oberon. User's Guide and Implementation.
- Computersysteme ETH Zürich, Technical Report No. 133, June 1990.
-
- [Wi85] N. Wirth. A Fast and Compact Compiler for Modula-2 .
- Computersysteme ETH Zürich, Technical Report No. 64, July 1985.
-
- [Wi88] N. Wirth. The Programming Language Oberon . Software - Practice
- and Experience, 18, 7, July 1988 and Computersysteme ETH Zürich,
- Technical Report No. 143, November 1990.
-
- Appendix A: A modified Linearization Algorithm
-
- When working with rather large graphs consisting of several thousands or
- even millions of nodes, it might be impractical to use a translation
- table NodeTab of about the same size as the graph for reading. If only
- multiple referenced nodes had to be stored in the NodeTab array, this
- translation table could be much smaller. This can be achieved by a split
- of the write phase into two subphases. As a desirable side effect, after
- writing, all preconditions for writing are established again, i.e. no
- unmarking is necessary. The modifications are described shortly:
-
- Writing :
-
- In the first pass all nodes are traversed and the mark field is used as
- reference counter (procedure Mark). For the mark field of every
- (reachable) node the following holds: In the second pass, every node
- that is referenced more than once (i.e. mark > 1) is written using a
- special tag and its negative node number is stored in its mark field
- (which now again fulfills the preconditions for writing).
-
- Reading:
-
- A negative node tag designates an already read node. A positive node tag
- specifies a node which occurs only once (tag is even) or a node which
- will be referenced again (tag is odd) and therefore must be stored in
- the NodeTab array.
-
- TYPE
- Node = POINTER TO NodeDesc;
- NodeDesc = RECORD
- mark: INTEGER; (* used for linearization; initially < 0 *)
- tag: INTEGER; (* determines node type; tag > 0 *)
- data: ARRAY M(tag) OF INTEGER; (* N depends on the node type (tag) *)
- link: ARRAY N(tag) OF Node (* M depends on the node type (tag) *)
- END;
-
- VAR
- NofNodes: INTEGER; (* node number; initially = 1 *)
- NodeTab: ARRAY MaxRefs OF Node;
- (* node table; contains each node referenced more than once *)
-
- PROCEDURE Mark(q: Node); (* Pass 1 *)
- (* precondition: q: q.mark < 0 *)
- VAR i: INTEGER;
- BEGIN
- IF q # NIL THEN
- IF q.mark < 0 THEN (* first occurence *) q.mark := 1;
- i := 0; WHILE i < Ntag DO Mark(q.link[i]); INC(i) END
- ELSE INC(q.mark)
- END
- END
- END Mark;
- (* postcondition: q: q is marked *)
-
- PROCEDURE WriteNode(q: Node); (* Pass 2 *)
- (* precondition: (q: q is marked) (NofNodes = 1) *)
- VAR i: INTEGER;
- BEGIN
- IF q = NIL THEN WriteInt(0)
- ELSIF q.mark < 0 THEN (* already marked *) WriteInt(q.mark)
- ELSE (* first occurence *)
- IF q.mark = 1 THEN (* node occurs only once *)
- WriteInt(q.tag*2); q.mark := -1
- ELSE (* node is referenced several times *)
- WriteInt(q.tag*2 + 1); q.mark := -NofNode; INC(NofNodes)
- END;
- i := 0; WHILE i < Mtag DO WriteInt(q.data[i]); INC(i) END;
- i := 0; WHILE i < Ntag DO WriteNode(q.link[i]); INC(i) END
- END
- END WriteNode;
- (* postcondition: q: q.mark < 0 *)
-
- PROCEDURE ReadNode(VAR q: Node);
- (* precondition: (NodeTab[0] = NIL) (NofNodes = 1) *)
- VAR tag, i: INTEGER;
- BEGIN
- ReadInt(tag);
- IF tag <= 0 THEN (* node reference *) q := NodeTab[-tag]
- ELSE (* first occurence *)
- NEW(q, tag DIV 2);
- IF ODD(tag) THEN (* node is referenced several times *)
- NodeTab[NofNodes] := q; INC(NofNodes)
- END;
- i := 0; WHILE i < Mtag DO ReadInt(q.data[i]); INC(i) END;
- i := 0; WHILE i < Ntag DO ReadNode(q.link[i]); INC(i) END
- END
- END ReadNode;
-
- Appendix B: Import / Export in Detail
-
- In the following an extract of a table handler with the complete import
- / export procedures is shown. A few points are specific to this
- implementation and hence commented accordingly.
-
- CONST
-
- (* implementation restrictions *)
-
- MaxNofGMods = 32; MaxNofLMods = 24; MaxNofStructs = 256;
-
- (* object modes *)
-
- Undef* = 0; Scope = 1; Const* = 2; Type* = 3; Var* = 4; VarPar* = 5;
- XVar* = 6; IVar* = 7; Field* = 8; LProc* = 9; XProc* = 10;
- IProc* = 11; SProc* = 12; Mod* = 13;
-
- (* type forms *)
-
- Char* = 2; Bool* = 3; SInt* = 4; Int* = 5; LInt* = 6; Set* = 7;
- Real* = 8; LReal* = 9; Cmplx* = 10; LCmplx* = 11; NilTyp* = 13;
- NoTyp* = 14; Proc* = 15; String* = 16; Array* = 17; DynArr* = 18;
- Record* = 19; Pointer* = 20;
-
- TYPE
-
- Name* = ARRAY 32 OF CHAR;
- Object* = POINTER TO ObjectDesc;
- Struct* = POINTER TO StructDesc;
-
- ObjectDesc* = RECORD
- link*, next*: Object; (* objects are chained using next *)
- name* : Name;
- typ* : Struct;
- marked* : BOOLEAN; (* marked objects are exported *)
- leave* : BOOLEAN;
- mode* : INTEGER; (* identifies object kind *)
- mnolev* : INTEGER; (* module numbers are <= 0 *)
- a0*, a1* : LONGINT; (* object specific data *)
- b0*, b1* : LONGINT (* object specific data *)
- END;
-
- StructDesc* = RECORD
- ref : INTEGER; (* used as mark field *)
- form* : INTEGER; (* identifies structure kind *)
- obj* : Object; (* points to type object if it exists *)
- len*, size*: LONGINT;
- base* : Struct; (* result-, element-, base- or pointee type *)
- link* : Object (* record scope *)
- END;
-
- VAR
- system, universe: Object; (* predefined scopes *)
- topScope*, undefObj*: Object; (* current topScope, error object *)
- firstStructRef: INTEGER;
- nofGMods*: INTEGER;
- GMod*: ARRAY MaxNofGMods OF Object; (* global module table *)
-
- (* predefined types *)
- undefTyp*, noTyp*, stringTyp*, boolTyp*, charTyp*, sIntTyp*, intTyp*,
- lIntTyp*, setTyp*, realTyp*, lRealTyp*, cmplxTyp*, lCmplxTyp*: Struct;
-
- PROCEDURE err(no: INTEGER);
- (* Displays an error message *)
-
- (* general table handling *)
-
- PROCEDURE NewObject*(VAR obj: Object; VAR name: ARRAY OF CHAR; mode: INTEGER);
- (* Creates a new object and initializes its fields *)
-
- PROCEDURE NewStruct*(VAR str: Struct; form: INTEGER);
- (* Creates a new structure and initializes its fields *)
-
- PROCEDURE OpenScope*(VAR scope: Object; mnolev: INTEGER);
- (* Opens a new scope if scope is NIL; otherwise the old scope is
- reopened *)
-
- PROCEDURE CloseScope*;
- (* Closes topScope *)
-
- PROCEDURE Insert*(VAR obj: Object; name: ARRAY OF CHAR; mode: INTEGER);
- VAR p, q: Object;
- BEGIN
- p := topScope; q := topScope.next;
- WHILE q # NIL DO
- IF q.name = name THEN err(1) END;
- p := q; q := q.next
- END;
- NewObject(obj, name, mode); obj.mnolev := topScope.mnolev; p.next := obj
- END Insert;
-
- (* import table handling *)
-
- PROCEDURE InsertMod*(VAR mod: Object; VAR name: ARRAY OF CHAR; key: LONGINT);
- VAR i: INTEGER;
- BEGIN
- i := 0;
- WHILE (i < nofGMods) & (name # GMod[i].name) DO INC(i) END;
- IF i < nofGMods THEN (* module already imported *)
- mod := GMod[i];
- IF mod.b0 # key THEN err(150) (* key inconsistency *) END
- ELSE
- NewObject(mod, name, Mod); (* must not be visible in global scope *)
- mod.b0 := key; mod.mnolev := -nofGMods;
- OpenScope(mod.link, mod.mnolev); CloseScope; (* allocate own scope *)
- IF nofGMods < MaxNofGMods THEN GMod[nofGMods] := mod; INC(nofGMods)
- ELSE err(227) (* to many imported modules *)
- END
- END
- END InsertMod;
-
- PROCEDURE InsertImport(VAR obj: Object; scope: Object);
- VAR p, q: Object;
- BEGIN
- p := scope; q := scope.next;
- WHILE q # NIL DO
- IF q.name = obj.name THEN obj := q; RETURN END;
- p := q; q := q.next
- END;
- obj.mnolev := scope.mnolev; p.next := obj
- END InsertImport;
-
- (* import *)
-
- PROCEDURE OpenRider(VAR R: Files.Rider; name: ARRAY OF CHAR; VAR res: INTEGER);
- (* Sets a rider to the beginning of the symbol file *)
-
- PROCEDURE Import*(VAR substName, impName, selfName: ARRAY OF CHAR);
- VAR
- R: Files.Rider;
- mod0, mod: Object;
- res, nofLMods, nofStructs: INTEGER;
- LMod: ARRAY MaxNofLMods OF Object;
- LStruct: ARRAY MaxNofStructs OF Struct;
-
- PROCEDURE^ ReadStruct(VAR typ: Struct);
-
- PROCEDURE ReadInt(VAR i: LONGINT);
- (* Reads integers written in a compacted form [Te90] *)
-
- VAR n: LONGINT; s: INTEGER; x: CHAR;
-
- BEGIN
- s := 0; n := 0; Files.Read(R, x);
- WHILE ORD(x) >= 128 DO
- INC(n, ASH(ORD(x) - 128, s)); INC(s, 7); Files.Read(R, x)
- END;
- i := n + ASH(ORD(x) MOD 64 - ORD(x) DIV 64 * 64, s)
- END ReadInt;
-
- PROCEDURE ReadName(VAR name: ARRAY OF CHAR);
- (* Reads a name terminated with 0X *)
-
- PROCEDURE ReadString(VAR pos, len: LONGINT);
- (* Reads a string terminated with 0X into the scanner string buffer *)
-
- PROCEDURE ReadKey(VAR x: LONGINT);
- (* Reads an integer in uncompacted form *)
-
- PROCEDURE ReadMod(VAR mod: Object);
- VAR ref, key: LONGINT; name: Name;
- BEGIN
- ReadInt(ref);
- IF ref > 0 THEN (* first occurence *)
- ReadKey(key); ReadName(name);
- IF name = selfName THEN err(49) END;
- InsertMod(mod, name, key);
- IF nofLMods < MaxNofLMods THEN LMod[nofLMods] := mod; INC(nofLMods)
- ELSE err(227) (* to many imported modules *)
- END
- ELSE mod := LMod[-ref]
- END
- END ReadMod;
-
- PROCEDURE ReadFields(VAR field: Object);
- VAR obj, last: Object; name: Name;
- BEGIN
- field := NIL;
- LOOP
- ReadName(name);
- IF name = "" THEN EXIT END;
- NewObject(obj, name, Field); ReadStruct(obj.typ); ReadInt(obj.a0);
- IF field = NIL THEN field := obj ELSE last.next := obj END;
- last := obj
- END
- END ReadFields;
-
- PROCEDURE ReadFP(VAR par: Object; VAR nofArgs: LONGINT);
- VAR obj, last: Object; typ: Struct; n, a0, a1: LONGINT; name: Name;
- BEGIN
- par := NIL; ReadInt(nofArgs); n := nofArgs;
- WHILE n > 0 DO
- ReadName(name); ReadStruct(typ); ReadInt(a0); ReadInt(a1);
- IF ODD(a1) THEN NewObject(obj, name, VarPar)
- ELSE NewObject(obj, name, Var)
- END;
- obj.typ := typ; obj.a0 := a0; obj.a1 := a1 DIV 2;
- IF par = NIL THEN par := obj ELSE last.next := obj END;
- last := obj; DEC(n)
- END
- END ReadFP;
-
- PROCEDURE ReadStruct(VAR typ: Struct);
- VAR form: LONGINT; htyp: Struct; name: Name; obj, mod: Object;
- BEGIN
- ReadInt(form);
- IF form <= 0 THEN (* struct reference or NIL *) typ := LStruct[-form]
- ELSE (* first occurence *)
- NewStruct(htyp, SHORT(form)); ReadName(name);
- IF name # "" THEN (* named type *)
- NewObject(obj, name, Type); obj.marked := TRUE; obj.typ := htyp;
- htyp.obj := obj; ReadMod(mod); InsertImport(obj, mod.link);
- typ := obj.typ
- ELSE typ := htyp
- END;
- IF nofStructs < MaxNofStructs THEN
- LStruct[nofStructs] := typ; INC(nofStructs)
- ELSE err(228) (* to many imported types *)
- END;
- ReadStruct(htyp.base);
- CASE form OF
- | Proc:
- OpenScope(htyp.link, 0); ReadFP(htyp.link.next, htyp.len);
- CloseScope; htyp.size := 1
- | Array: ReadInt(htyp.len); ReadInt(htyp.size)
- | DynArr: ReadInt(htyp.size)
- | Record:
- OpenScope(htyp.link, 0); ReadFields(htyp.link.next); CloseScope;
- ReadInt(htyp.size);
- IF htyp.base # NIL THEN
- htyp.len := htyp.base.len+1; htyp.link.link := htyp.base.link
- ELSE htyp.len := 0
- END
- | Pointer: typ.size := 1
- END
- END
- END ReadStruct;
-
- PROCEDURE ReadObjects(scope: Object);
- VAR mode: LONGINT; obj: Object; typ: Struct; name: Name;
- BEGIN
- LOOP (* read all objects *)
- ReadInt(mode);
- IF mode = Undef THEN EXIT
- ELSIF mode = Type THEN
- ReadName(name);
- IF name # "" THEN (* alias type *)
- NewObject(obj, name, Type); ReadStruct(obj.typ);
- InsertImport(obj, scope)
- ELSE (* other types *) ReadStruct(typ)
- END
- ELSE
- ReadName(name); NewObject(obj, name, SHORT(mode));
- ReadStruct(obj.typ);
- IF mode = Const THEN
- IF obj.typ.form = String THEN ReadString(obj.b0, obj.b1)
- ELSIF obj.typ.form = LReal THEN ReadInt(obj.b0); ReadInt(obj.b1)
- ELSE ReadInt(obj.b0)
- END
- ELSIF obj.mode = Var THEN ReadInt(obj.a0); obj.mode := XVar
- ELSIF obj.mode = LProc THEN
- OpenScope(obj.link, 0); ReadFP(obj.link.next, obj.b0); CloseScope;
- obj.mode := XProc
- ELSIF obj.mode = IProc THEN
- OpenScope(obj.link, 0); ReadFP(obj.link.next, obj.b0); CloseScope
- END;
- InsertImport(obj, scope)
- END
- END
- END ReadObjects;
-
- PROCEDURE EnterStruct(typ: Struct);
- (* Enters a predefined type into the LStruct table *)
-
- BEGIN
- IF impName = "SYSTEM" THEN Insert(mod, impName, Mod); mod.link := system
- ELSE OpenRider(R, impName, res);
- IF res < 0 THEN (*
- no error occured
- *)
- LStruct[0] := NIL; EnterStruct(stringTyp);
- EnterStruct(undefTyp); EnterStruct(noTyp); EnterStruct(boolTyp); EnterStruct(charTyp);
- EnterStruct(sIntTyp); EnterStruct(intTyp); EnterStruct(lIntTyp); EnterStruct(setTyp);
- EnterStruct(realTyp); EnterStruct(lRealTyp); EnterStruct(cmplxTyp); EnterStruct(lCmplxTyp);
- nofLMods := 0; nofStructs := firstStructRef;
- ReadMod(mod0); ReadObjects(mod0.link);
- Insert(mod, substName, Mod);
- mod.b0 := mod0.b0; mod.mnolev := mod0.mnolev; mod.link := mod0.link
- ELSE err(res)
- END
- END
- END Import;
-
- (* export *)
-
- PROCEDURE Export*
- (mod: Object; VAR buf: ARRAY OF CHAR; VAR len: LONGINT; VAR new: BOOLEAN);
-
- VAR pos: LONGINT; nofLMods, nofStructs: INTEGER;
-
- PROCEDURE^ WriteStruct(typ: Struct);
-
- PROCEDURE Write(ch: CHAR);
- BEGIN buf[pos] := ch; INC(pos)
- END Write;
-
- PROCEDURE WriteInt(i: LONGINT);
- (* Writes integers in a compacted form [Te90] *)
- BEGIN
- WHILE (i < -64) OR (i > 63) DO
- Write(CHR(i MOD 128 + 128)); i := i DIV 128
- END;
- Write(CHR(i MOD 128))
- END WriteInt;
-
- PROCEDURE WriteName(VAR name: ARRAY OF CHAR);
- (* Writes a name terminated with 0X *)
-
- PROCEDURE WriteString(pos: LONGINT);
- (* Writes the string terminated with 0X at position pos of the scanner
- string buffer *)
-
- PROCEDURE WriteKey(x: LONGINT);
- (* Writes an integer in uncompacted form *)
-
- PROCEDURE WriteMod(mod: Object);
- BEGIN
- IF mod.b1 < 0 THEN (* first occurence *)
- mod.b1 := nofLMods; INC(nofLMods);
- WriteInt(Mod); WriteKey(mod.b0); WriteName(mod.name)
- ELSE WriteInt(-mod.b1)
- END
- END WriteMod;
-
- PROCEDURE WriteFields(field: Object);
- BEGIN
- WHILE field # NIL DO
- IF field.marked THEN
- WriteName(field.name); WriteStruct(field.typ); WriteInt(field.a0)
- END;
- field := field.next
- END;
- Write(0X)
- END WriteFields;
-
- PROCEDURE WriteFP(par: Object; nofArgs: LONGINT);
- BEGIN
- WriteInt(nofArgs);
- WHILE nofArgs > 0 DO
- WriteName(par.name); WriteStruct(par.typ); WriteInt(par.a0);
- IF par.mode = VarPar THEN WriteInt(par.a1*2 + 1)
- ELSE WriteInt(par.a1*2) END;
- par := par.next; DEC(nofArgs)
- END
- END WriteFP;
-
- PROCEDURE WriteStruct(typ: Struct);
- VAR name: Name;
- BEGIN
- IF typ = NIL THEN WriteInt(0)
- ELSIF typ.ref > 0 THEN (* already marked *) WriteInt(-typ.ref)
- ELSE (* first occurence *)
- WriteInt(typ.form); typ.ref := nofStructs; INC(nofStructs);
- IF typ.obj # NIL THEN (* named type *)
- name := typ.obj.name;
- IF ~typ.obj.marked THEN (* invisible type *)
- name[0] := CHR(ORD(name[0]) - ORD("@"))
- END;
- WriteName(name); WriteMod(GMod[-typ.obj.mnolev])
- ELSE Write(0X)
- END;
- WriteStruct(typ.base);
- CASE typ.form OF
- | Proc: WriteFP(typ.link.next, typ.len)
- | Array: WriteInt(typ.len); WriteInt(typ.size)
- | DynArr: WriteInt(typ.size)
- | Record: WriteFields(typ.link.next); WriteInt(typ.size)
- | Pointer:
- END
- END
- END WriteStruct;
-
- PROCEDURE WriteObjects(obj: Object);
- BEGIN
- WHILE obj # NIL DO
- IF obj.marked THEN
- WriteInt(obj.mode);
- IF (obj.mode # Type) OR (obj.typ.obj # obj) THEN
- (* no-type or alias type *) WriteName(obj.name)
- ELSE (* other type *) Write(0X)
- END;
- WriteStruct(obj.typ);
- IF obj.mode = Const THEN
- IF obj.typ.form = String THEN WriteString(obj.b0)
- ELSIF obj.typ.form = LReal THEN WriteInt(obj.b0); WriteInt(obj.b1)
- ELSE WriteInt(obj.b0)
- END
- ELSIF obj.mode = Var THEN WriteInt(obj.a0)
- ELSIF obj.mode IN {LProc, IProc} THEN WriteFP(obj.link.next, obj.b0)
- END
- END;
- obj := obj.next
- END;
- WriteInt(Undef) (* termination tag *)
- END WriteObjects;
-
- PROCEDURE Compare(VAR buf: ARRAY OF CHAR; len: LONGINT; VAR new: BOOLEAN);
- VAR res: INTEGER; i: LONGINT; R: Files.Rider; ch: CHAR;
- prefix: ARRAY 5 OF CHAR;
- BEGIN
- OpenRider(R, mod.name, res); new := TRUE;
- IF res < 0 THEN
- Files.ReadBytes(R, prefix, LEN(prefix)); Files.Read(R, ch);
- i := LEN(prefix);
- WHILE (i < len) & (buf[i] = ch) DO Files.Read(R, ch); INC(i) END;
- IF i = len THEN (* same symbol table => use old key *)
- i := 0; WHILE i < LEN(prefix) DO buf[i] := prefix[i]; INC(i) END;
- new := FALSE
- END
- END
- END Compare;
-
- BEGIN
- pos := 0; nofLMods := 0; nofStructs := firstStructRef;
- WriteMod(mod); WriteObjects(mod.link.next);
- Compare(buf, pos, new)
- END Export;
-
-